Remote I/O Optimization and Evaluation for Tertiary Storage Systems through Storage Resource Broker

Authors

  • Xiaohui Shen
  • Wei-keng Liao
  • Alok Choudhary
Abstract

Large-scale parallel scientific applications generate such huge amounts of data that tertiary storage systems have emerged as a popular place to hold them. SRB, a uniform interface to various storage systems, including tertiary storage systems such as HPSS and UniTree, has become an important and convenient way to access tertiary data across networks in a distributed environment. However, SRB is not optimized for parallel data access: each SRB I/O call to a storage system must access a contiguous piece of data, just like UNIX I/O. For many access patterns, this results in numerous small I/O calls, which are very expensive. In this paper, we present a run-time library (SRB-OL) for optimizing tertiary storage access on top of SRB's low-level I/O functions. SRB-OL extends various state-of-the-art I/O optimizations found in secondary storage systems to a remote data access environment via SRB. We also present a novel optimization scheme, superfile, that handles large numbers of small files efficiently, and we incorporate a subfile technique and other SRB features such as container, migrate, stage, and purge into SRB-OL. The choice of optimization is made by a Meta-data Management System (MDMS) [7] that resides one level above SRB-OL. The user provides access pattern information (hints) to MDMS through the application; MDMS uses these hints to choose an optimal I/O approach and passes the decision to SRB-OL. Finally, SRB-OL performs optimized SRB I/O calls to access data residing on tertiary storage systems. To give a quantitative view of the optimized SRB I/O functions, we propose a performance model based on a significant number of I/O experiments. Using this performance model, we show that collective I/O, superfile, and other optimizations yield significant performance improvements. In addition, we present an I/O Performance Predictor that can estimate I/O cost before the user actually carries out her experiment, which helps the user when planning and running her application.
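The cost of many small contiguous calls can be illustrated with a minimal sketch. It is not SRB-OL code and does not reproduce the paper's performance model: the linear cost formula (per-call latency plus bytes over bandwidth), the function names, and the latency and bandwidth values are all assumptions chosen for illustration. It compares reading a column block of a row-major array with one call per row against a data-sieving style read that fetches the whole enclosing byte range in one call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical linear cost model: per-call overhead plus transfer time."""
    latency_s: float      # assumed fixed cost of one remote I/O call, in seconds
    bandwidth_bps: float  # assumed sustained transfer rate, in bytes per second

    def cost(self, calls: int, nbytes: int) -> float:
        return calls * self.latency_s + nbytes / self.bandwidth_bps


def column_block_requests(rows, cols, elem_size, col_start, col_count):
    """Return (offset, length) byte ranges for a column block of a row-major array.

    Each row contributes one contiguous piece, so a contiguous-only
    interface needs one call per row.
    """
    row_bytes = cols * elem_size
    return [
        (r * row_bytes + col_start * elem_size, col_count * elem_size)
        for r in range(rows)
    ]


def naive_plan(requests):
    """One I/O call per contiguous piece; only the wanted bytes are moved."""
    return len(requests), sum(length for _, length in requests)


def sieved_plan(requests):
    """One large call covering the enclosing range; unwanted bytes are moved too."""
    start = min(offset for offset, _ in requests)
    end = max(offset + length for offset, length in requests)
    return 1, end - start


if __name__ == "__main__":
    # Illustrative numbers only: 1024 x 1024 array of 8-byte elements,
    # reading the first 256 columns over a slow, high-latency link.
    model = CostModel(latency_s=0.05, bandwidth_bps=10e6)
    requests = column_block_requests(1024, 1024, 8, col_start=0, col_count=256)
    for name, plan in (("naive", naive_plan), ("sieved", sieved_plan)):
        calls, nbytes = plan(requests)
        print(f"{name}: {calls} calls, {nbytes} bytes, "
              f"{model.cost(calls, nbytes):.2f} s estimated")
```

Under these assumed parameters the single large read moves about four times as many bytes but avoids over a thousand per-call latencies. With a much lower latency or a much sparser access pattern the comparison can go the other way, which is why a cost estimate is useful before choosing an approach.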


Similar resources

I/O Optimization and Evaluation for Tertiary Storage Systems

Large-scale parallel scientific applications generate such huge amounts of data that tertiary storage systems have emerged as a popular place to hold them. SRB, a uniform interface to various storage systems, including tertiary storage systems such as HPSS and UniTree, has become an important and convenient way to access tertiary data across networks in a distributed environment. But SRB is not optimi...


Efficient Buffering for Concurrent Disk and Tape I/O

Tertiary storage is becoming increasingly important for many organizations involved in large-scale data analysis and data mining activities. Yet database management systems (DBMS) and other data-intensive systems do not incorporate tertiary storage as a first-class citizen in the storage hierarchy. For instance, the typical solution for bringing tertiary-resident data under the control of a DBMS ...


A High-Performance Cluster Storage Server

An essential building block for any Data Grid infrastructure is the storage server. In this paper we describe a high-performance cluster storage server built around the SDSC Storage Resource Broker (SRB) and commodity workstations. A number of performance critical design issues and our solutions to them are described. We incorporate pipeline optimizations into SRB to enable the full overlapping...


A Simple Mass Storage System for the SRB Data Grid

The functionality that is provided by Mass Storage Systems can be implemented using data grid technology. Data grids already provide many of the required features, including a logical name space and a storage repository abstraction. We demonstrate how management of tape resources can be integrated into data grids. The resulting infrastructure has the ability to manage archival storage of digita...


Secure and Efficient Client and Server Side Data Deduplication to Reduce Storage in Remote Cloud Computing Systems

Duplication of data in storage systems is becoming an increasingly common problem. The system introduces I/O Deduplication, a storage optimization that uses content similarity to improve I/O performance by eliminating I/O operations and reducing mechanical delays during I/O operations, and that shares data with existing users if duplication is found on the client or server side. I/O Deduplicat...




Publication date: 2001